SHAP Feature Interactions

🔖 interpretability
🔖 machine learning
Author

Guangyao Zhao

Published

May 3, 2023

ref: Interpretable machine learning

1 The Shapley value

The Shapley values have the same shape as the feature values. In other words, each feature value of an instance has a corresponding Shapley value.

\[ \phi_{j}(\operatorname{val})=\sum_{S \subseteq\left\{x_{1}, \ldots, x_{p}\right\} \backslash\left\{x_{j}\right\}} \frac{|S| !\,(p-|S|-1) !}{p !}\left(\operatorname{val}\left(S \cup\left\{x_{j}\right\}\right)-\operatorname{val}(S)\right) \tag{1}\]

  • \(j\): feature index.
  • \(S\): subset of the features.
  • \(x\): the vector of feature values of the instance to be explained.
  • \(|S|\): the number of features in the subset \(S\).
  • \(p\): the number of features.
  • \(val(S)\): prediction for feature values in set \(S\).
  • \(\frac{|S| !(p-|S|-1) !}{p !}\): weight.
  • \(S \subseteq\left\{x_{1}, \ldots, x_{p}\right\} \backslash\left\{x_{j}\right\}\): \(S\) is a subset of \(\left\{x_{1}, \ldots, x_{p}\right\}\) that does not include feature \(x_{j}\).
  • computational complexity: \(2^{p}\) coalitions (see the walk-through below).

Suppose the dataset has four features \(\{x_1,x_2,x_3,x_4\}\) and we want to know the Shapley value of \(x_3\). When computing the weight, the denominator is \(p!=4!=24\), and the subsets \(S\) fall into the following cases.

1.1 \(|S|=0\)

When \(S=\emptyset\): the coalitions to compare are \(\emptyset\) and \(\{x_3\}\), and the numerator is \(0!\,(4-0-1)!=6\).

1.2 \(|S|=1\)

  • \(S=\{x_1\}\), \(S=\{x_1, x_3\}\)
  • \(S=\{x_2\}\), \(S=\{x_2, x_3\}\)
  • \(S=\{x_4\}\), \(S=\{x_4, x_3\}\)

and the numerator is \(1!\,(4-1-1)!=2\).

1.3 \(|S|=2\)

  • \(S=\{x_1,x_2\}\), \(S=\{x_1,x_2, x_3\}\)
  • \(S=\{x_1,x_4\}\), \(S=\{x_1,x_4, x_3\}\)
  • \(S=\{x_2,x_4\}\), \(S=\{x_2,x_4, x_3\}\)

and the numerator is \(2!\,(4-2-1)!=2\).

1.4 \(|S|=3\)

  • \(S=\{x_1,x_2,x_4\}\), \(S=\{x_1,x_2,x_4,x_3\}\)

and the numerator is \(3!\,(4-3-1)!=6\).
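
To make this enumeration concrete, here is a minimal brute-force sketch of Equation (1). The function val is a hypothetical placeholder for the coalition prediction \(val(S)\); the loop at the end reproduces the numerators and the denominator worked out in the cases above.

from itertools import combinations
from math import factorial


def shapley_value(val, j, p):
    """Brute-force Shapley value of feature j (exponential in p).

    `val` is assumed to map a frozenset of feature indices to the
    prediction obtained when only those features are known.
    """
    others = [k for k in range(p) if k != j]
    phi = 0.0
    for size in range(p):  # |S| = 0, 1, ..., p - 1
        weight = factorial(size) * factorial(p - size - 1) / factorial(p)
        for S in combinations(others, size):
            S = frozenset(S)
            phi += weight * (val(S | {j}) - val(S))
    return phi


# Weights for the four-feature walk-through above (p = 4): 6/24, 2/24, 2/24, 6/24.
p = 4
for size in range(p):
    print(size, factorial(size) * factorial(p - size - 1), "/", factorial(p))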

2 Interaction effect

\[ \phi_{i, j}=\sum_{S \subseteq \mathcal{M} \backslash\{i, j\}} \frac{|S| !\,(M-|S|-2) !}{2(M-1) !} \delta_{i j}(S) \tag{2}\]

where \(\mathcal{M}\) is the set of all \(M\) features, and when \(i\ne j\):

\[ \delta_{ij}(S)=f_x(S\cup\{i,j\})-f_x(S\cup\{i\})-f_x(S\cup\{j\})+f_x(S) \tag{3}\]

This formula subtracts the main effect of the features, so that we get the pure interaction effect after accounting for the individual effects. We average the values over all possible feature coalitions \(S\), as in the Shapley value computation.

When we compute SHAP interaction values for all features, we get one matrix per instance with dimensions \(M \times M\), where \(M\) is the number of features.

For each feature, the SHAP value decomposes into its main effect plus its interaction effects with all other features. Here the main effect means the contribution of a feature without any interaction, and the main effect values are the diagonal entries of the interaction matrix.
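
Concretely, the main effect on the diagonal is defined so that the decomposition is exact, and each SHAP value is recovered by summing row \(i\) of the interaction matrix:

\[ \phi_{i, i}=\phi_{i}-\sum_{j \neq i} \phi_{i, j}, \qquad \phi_{i}=\sum_{j} \phi_{i, j} \tag{4}\]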

3 Code

ref: _tree.py, Basic SHAP Interaction Value Example in XGBoost

For models with a single output this returns a tensor of SHAP values (samples, features, features). The matrix (features, features) for each sample sums to the difference between the model output for that sample and the expected value of the model output (which is stored in the expected_value attribute of the explainer).

Each row of this matrix sums to the SHAP value for that feature for that sample. The diagonal entries of the matrix represent the “main effect” of that feature on the prediction and the symmetric off-diagonal entries represent the interaction effects between all pairs of features for that sample. For models with vector outputs this returns a list of tensors, one for each output.

3.1 No interaction effect between features

Import packages.

import xgboost
import numpy as np
import shap
import matplotlib.pyplot as plt

Simulate some binary data and a linear outcome without an interaction term. Note that we make the features in X perfectly independent of each other to make it easy to solve for the exact SHAP values.

N = 2000
X = np.zeros((N, 5))
X[:1000, 0] = 1
X[:500, 1] = 1
X[1000:1500, 1] = 1
X[:250, 2] = 1
X[500:750, 2] = 1
X[1000:1250, 2] = 1
X[1500:1750, 2] = 1
X[:, 0:3] -= 0.5
y = 2 * X[:, 0] - 3 * X[:, 1]
X.shape
(2000, 5)

Ensure the variables are independent.

np.cov(X.T)
array([[0.25012506, 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.25012506, 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.25012506, 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ]])

Train a model with a single tree.

Xd = xgboost.DMatrix(X, label=y)
model = xgboost.train({"eta": 1, "max_depth": 3, "base_score": 0, "lambda": 0}, Xd, 1)
# print("Model error =", np.linalg.norm(y - model.predict(Xd)))
# print(model.get_dump(with_stats=True)[0])

Make sure the SHAP values add up to marginal predictions.

pred = model.predict(Xd, output_margin=True)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Xd)

#! Sum up the SHAP values for each individual sample and
# check if the total is equal to the predicted value.
np.abs(shap_values.sum(1) + explainer.expected_value - pred).max()
0.0

If we build a summary plot, we see that only features 0 and 1 have any effect, and that their effects have only two possible values (one for \(-0.5\) and one for \(+0.5\)).

shap.summary_plot(shap_values, X)  #! No interaction effect here.

Train a linear model and compare its main effects with the SHAP values from the XGBoost model above.

from sklearn import linear_model

lr = linear_model.LinearRegression()
lr.fit(X, y)
lr_pred = lr.predict(X)
lr.coef_.round(2)
array([ 2., -3.,  0.,  0.,  0.])
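
Because there is no interaction term, the linear model reproduces the outcome exactly; a quick check with the lr_pred computed above (a small sketch):

np.abs(lr_pred - y).max()  #! Numerically zero: the linear fit is exact here.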

Make sure the computed SHAP values match the true SHAP values. We can compute the true SHAP values directly for this simple case.

main_effect_shap_values = lr.coef_ * (
    X - X.mean(0)
)  # Note the meaning of "main effect" in the SHAP context: coefficient times the centered feature value.

#! Check the difference with the norm:
# if this value is near zero, the SHAP values equal the main effect values.
np.linalg.norm(shap_values - main_effect_shap_values)
2.2700086067334356e-13

Note that when there are no interactions present the SHAP interaction values are just a diagonal matrix with the SHAP values on the diagonal.

shap_interaction_values = explainer.shap_interaction_values(Xd)
shap_interaction_values[0]
array([[ 1. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. , -1.5,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ]], dtype=float32)
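
A quick numerical check of this claim with the arrays computed above (a minimal sketch): the off-diagonal entries are numerically zero and the row sums reproduce shap_values.

#! Off-diagonal entries vanish when there is no interaction.
offdiag = shap_interaction_values * (1 - np.eye(X.shape[1]))
print(np.abs(offdiag).max())
#! Each row sum reproduces the corresponding SHAP value.
print(np.abs(shap_interaction_values.sum(2) - shap_values).max())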

Ensure the SHAP interaction values sum to the marginal predictions. Since shap_interaction_values.shape = (samples, features, features), shap_interaction_values.sum((1, 2)) sums all entries of each sample's interaction matrix; if the trained model fits well enough, this sum plus the expected model output (i.e., the average prediction, stored in explainer.expected_value) should equal the prediction for that sample.

(
    shap_interaction_values.sum((1, 2)) + explainer.expected_value - pred
)  # explainer.expected_value=0
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)

Ensure the main effects from the SHAP interaction values match those from a linear model.

dinds = np.diag_indices(shap_interaction_values.shape[1])  # indices of the diagonal entries
total = 0
for i in range(N):  # for each sample
    for j in range(5):  #! the diagonal entry is the main effect of feature j.
        total += np.abs(
            shap_interaction_values[i, j, j] - main_effect_shap_values[i, j]
        )  # main_effect_shap_values is from the linear model.
total
1.3146441563284285e-11
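
The dinds indices defined above can be used for the same check without the explicit loops (an equivalent vectorized sketch):

#! Extract all diagonal (main effect) entries at once: shape (samples, features).
diag_main_effects = shap_interaction_values[:, dinds[0], dinds[1]]
np.abs(diag_main_effects - main_effect_shap_values).sum()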

3.2 Explain a linear model with one interaction

Simulate some binary data and a linear outcome with an interaction term.

N = 2000
X = np.zeros((N, 5))
X[:1000, 0] = 1

X[:500, 1] = 1
X[1000:1500, 1] = 1

X[:250, 2] = 1
X[500:750, 2] = 1
X[1000:1250, 2] = 1
X[1500:1750, 2] = 1

X[:125, 3] = 1
X[250:375, 3] = 1
X[500:625, 3] = 1
X[750:875, 3] = 1
X[1000:1125, 3] = 1
X[1250:1375, 3] = 1
X[1500:1625, 3] = 1
X[1750:1875, 3] = 1
X[
    :, :4
] -= 0.4999  # we can't exactly mean center the data or XGBoost has trouble finding the splits
y = 2 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 1] * X[:, 2]

Because the features are not exactly mean-centered here, the off-diagonal entries of the covariance matrix are no longer exactly zero, although they are negligibly small.

np.cov(X.T)
array([[ 2.50125063e-01,  7.10898185e-18,  3.39898195e-17,
        -5.46502980e-17,  0.00000000e+00],
       [ 7.10898185e-18,  2.50125063e-01,  5.82047889e-17,
         4.24317354e-17,  0.00000000e+00],
       [ 3.39898195e-17,  5.82047889e-17,  2.50125063e-01,
         6.50916151e-17,  0.00000000e+00],
       [-5.46502980e-17,  4.24317354e-17,  6.50916151e-17,
         2.50125063e-01,  0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00]])

Train a model with a single tree.

Xd = xgboost.DMatrix(X, label=y)
model = xgboost.train({"eta": 1, "max_depth": 4, "base_score": 0, "lambda": 0}, Xd, 1)
print("Model error =", np.linalg.norm(y - model.predict(Xd)))
# print(model.get_dump(with_stats=True)[0])
Model error = 1.73650378306776e-06

Make sure the SHAP values add up to marginal predictions.

pred = model.predict(Xd, output_margin=True)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Xd)
np.abs(shap_values.sum(1) + explainer.expected_value - pred).max()
4.7683716e-07

If we build a summary plot we see that now only features 3 and 4 don’t matter, and that feature 1 can have four possible effect sizes due to interactions.

shap.summary_plot(shap_values, X)

Train a linear model and get the main effect.

lr = linear_model.LinearRegression()
lr.fit(X, y)
lr_pred = lr.predict(X)
lr.coef_.round(2)
array([ 2., -3.,  0.,  0.,  0.])

Note that the SHAP values no longer match the main effects because they now include interaction effects.

main_effect_shap_values = lr.coef_ * (X - X.mean(0))
np.linalg.norm(shap_values - main_effect_shap_values)
15.81138782962684
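
A per-feature breakdown (a small sketch) shows that the mismatch is concentrated in features 1 and 2, the interacting pair, while the non-interacting features still agree with the linear main effects.

#! Mean absolute difference per feature; only the interacting features deviate.
np.abs(shap_values - main_effect_shap_values).mean(0)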

3.3 SHAP interaction values

SHAP interaction contributions:

shap_interaction_values = explainer.shap_interaction_values(Xd)
shap_interaction_values[0].round(2)
array([[ 1.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  , -1.5 ,  0.25,  0.  ,  0.  ],
       [ 0.  ,  0.25,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ]], dtype=float32)
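
Even with the interaction present, each row of the interaction matrix still sums to the corresponding SHAP value, so the decomposition is exact (a quick check with the arrays above):

np.abs(shap_interaction_values.sum(2) - shap_values).max()  #! Close to zero: row sums equal the SHAP values.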

Ensure the SHAP interaction values sum to the marginal predictions.

np.abs(shap_interaction_values.sum((1, 2)) + explainer.expected_value - pred).max()
4.7683716e-07

Ensure the main effects from the SHAP interaction values match those from the linear model. While the SHAP values no longer match the linear main effects when interactions are present, the linear main effects do match the diagonal of the SHAP interaction value matrix.

total = 0
for i in range(N):  # for each sample
    for j in range(5):  #! the diagonal value of the interaction matrix.
        total += np.abs(
            shap_interaction_values[i, j, j] - main_effect_shap_values[i, j]
        )  # main_effect_shap_values is from linear model.
print("total = ", total)
total =  0.0005347490549072596
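
Finally, the interaction tensor itself can be visualized. Passing the 3-D interaction array to shap.summary_plot should produce a matrix of summary plots, one panel per feature pair (this call is a sketch; the figure is not shown here).

shap.summary_plot(shap_interaction_values, X)  #! Per-pair interaction summary plots.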